In JoshuaSlocum/model-log: Help you keep track of models you fit

Example workflow

1. Source necessary packages
2. Source necessary functions
3. Load raw data
   a. Could be a DB query
   b. Could be a flat file
4. Clean data
   a. Treat missing values
   b. Clean up dates/times
   c. Put into a "tidy" format
5. Feature Engineering
   a. Apply feature engineering functions to data
6. Modeling
   a. Build models ....
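The workflow above can be sketched end to end as a handful of small functions. This is only an illustration, written in Python with the standard library rather than in R; the column names, the in-memory CSV, the fill-with-zero policy for missing values, and the mean-spend "model" are all assumptions made for the example, not part of the package.

```python
import csv
import io
from datetime import datetime

# Hypothetical flat file contents; a DB query could stand in for this step.
RAW_CSV = """id,signup_date,spend
1,2019-01-05,10.5
2,2019-02-11,
3,2019-03-20,7.25
"""

def load_raw(text):
    """Load raw data (here, from a flat file held in memory)."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Treat missing values and parse dates, keeping one row per record."""
    cleaned = []
    for row in rows:
        cleaned.append({
            "id": int(row["id"]),
            "signup_date": datetime.strptime(row["signup_date"], "%Y-%m-%d").date(),
            # Assumed policy: a missing spend value is filled with 0.0.
            "spend": float(row["spend"]) if row["spend"] else 0.0,
        })
    return cleaned

def engineer(rows):
    """Attach derived features to the cleaned rows."""
    for row in rows:
        row["signup_month"] = row["signup_date"].month
    return rows

def build_model(rows):
    """Placeholder 'model' that just returns the mean spend."""
    return sum(r["spend"] for r in rows) / len(rows)

data = engineer(clean(load_raw(RAW_CSV)))
baseline = build_model(data)
```

Keeping each stage as its own function means a fresh session can replay the whole chain in order, while an interactive session can re-run a single stage.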

Design the feature_engineering script/function so that it applies only features that do not yet exist in the data. This makes interactive analysis/R&D more efficient (no need to re-create features already attached to the data), while a fresh session, which starts with none of the features attached, still rebuilds everything and remains completely reproducible.
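One possible way to get this behavior is a registry that maps each feature name to the function that computes it, so that only names missing from the data are computed. The sketch below is in Python for illustration only; `FEATURES`, `apply_new_features`, and the column names are hypothetical and are not taken from the package.

```python
# Hypothetical registry: feature name -> function computing it from one row.
FEATURES = {
    "spend_per_visit": lambda r: r["spend"] / r["visits"] if r["visits"] else 0.0,
    "is_big_spender": lambda r: r["spend"] > 100,
}

def apply_new_features(rows, features=FEATURES):
    """Add only the features that are not already columns in the data.

    Returns the names of the features that were actually computed.
    """
    if not rows:
        return []
    existing = set(rows[0])  # columns already attached to the data
    to_add = [name for name in features if name not in existing]
    for row in rows:
        for name in to_add:
            row[name] = features[name](row)
    return to_add

rows = [{"spend": 120.0, "visits": 4}, {"spend": 30.0, "visits": 0}]
first_pass = apply_new_features(rows)   # fresh data: every feature is computed
second_pass = apply_new_features(rows)  # repeat call: nothing left to compute
```

A repeat call in the same session skips everything already attached, while a fresh session starts from data with no engineered columns and therefore recomputes the full set.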

Determine whether it is better practice to keep raw and engineered features separate and join them when the modeling dataset is built, or to keep them together from the start. Keeping them separate might be better for creating new features from raw data, since you would be manipulating a smaller dataset (maybe?), but keeping them together is more compact as long as we maintain lists of "original" and "engineered" features.
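The two layouts can be compared with a small sketch. It is written in Python for illustration only and assumes a shared `id` key and made-up column names; neither the helpers nor the data come from the package.

```python
# Layout 1: raw and engineered features kept separately, keyed by a hypothetical id.
raw = {1: {"spend": 120.0, "visits": 4}, 2: {"spend": 30.0, "visits": 3}}
engineered = {1: {"spend_per_visit": 30.0}, 2: {"spend_per_visit": 10.0}}

def build_modeling_dataset(raw, engineered):
    """Join the two tables on their shared key when the modeling dataset is built."""
    return {key: {**raw[key], **engineered.get(key, {})} for key in raw}

# Layout 2: one combined table, plus lists that record which columns are which.
ORIGINAL_FEATURES = ["spend", "visits"]
ENGINEERED_FEATURES = ["spend_per_visit"]

combined = build_modeling_dataset(raw, engineered)

def select(table, columns):
    """Pull one group of columns back out of the combined table."""
    return {key: {c: row[c] for c in columns} for key, row in table.items()}
```

With the column lists maintained, either layout can be recovered from the other, so the choice comes down mostly to convenience and to the size of the data being manipulated.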

JoshuaSlocum/model-log documentation built on May 7, 2019, 12:04 p.m.
